CLASSIFICATION IN POSTURAL STYLE
This article contributes to the search for a notion of postural
style, focusing on the issue of classifying subjects in terms of how
they maintain posture. Longer term, the hope is to make it possible
to determine on a case by case basis which sensorial information
is prevalent in postural control, and to improve/adapt protocols for
functional rehabilitation among those who show deficits in maintain- 
ing posture, typically seniors. Here, we specifically tackle the statisti- 
cal problem of classifying subjects sampled from a two-class popula- 
tion. Each subject (enrolled in a cohort of 54 participants) undergoes 
four experimental protocols which are designed to evaluate potential 
deficits in maintaining posture. These protocols result in four com- 
plex trajectories, from which we can extract four small-dimensional 
summary measures. Because undergoing several protocols can be un- 
pleasant, and sometimes painful, we try to limit the number of pro- 
tocols needed for the classification. Therefore, we first rank the pro- 
tocols by decreasing order of relevance, then we derive four plug-in 
classifiers which involve the best (i.e., most informative), the two
best, the three best and all four protocols. This two-step procedure 
relies on the cutting-edge methodologies of targeted maximum likeli- 
hood learning (a methodology for robust and efficient inference) and 
super-learning (a machine learning procedure for aggregating vari- 
ous estimation procedures into a single better estimation procedure). 
A simulation study is carried out. The performances of the proce- 
dure applied to the real data set (and evaluated by the leave-one-out 
rule) go as high as an 87% rate of correct classification (47 out of 54 
subjects correctly classified), using only the best protocol. 

1. Introduction. This article contributes to the search for a notion of 
postural style, focusing on the issue of classifying subjects in terms of how 
they maintain posture. 

Posture is fundamental to all activities, including locomotion and pre- 
hension. Posture is the fruit of a dynamic analysis by the brain of visual, 
proprioceptive and vestibular information. Proprioceptive information stems
from the ability to sense the position, location, orientation and movement of 
the body and its parts. Vestibular information roughly relates to the sense 
of equilibrium. Every individual develops his/her own preferences according 
to his/her sensorimotor experience. Sometimes, a sole kind of information 
(usually visual) is processed in all situations. Although this kind of process- 
ing may be efficient for maintaining posture in one's usual environment, it 
is likely not adapted to reacting to new or unexpected situations. Such sit- 
uations may result in falling, the consequences of a fall being particularly 
bad in seniors. Longer term, the hope is to make it possible to determine 
on a case by case basis which sensorial information is prevalent in postural 
control, and to improve/adapt protocols for functional rehabilitation among 
those who show deficits in maintaining posture, typically seniors. 

As in earlier studies [Bertrand et al. (2001), Chambaz, Bonan and Vi- 
dal (2009) and references therein], our approach to characterizing postural 
control involves the use of a force-platform. Subjects standing on a force- 
platform are exposed to different perturbations, following different exper- 
imental protocols (or simply protocols in the sequel). The force-platform 
records over time the center-of-pressure of each foot, that is, "the position of 
the global ground reactions forces that accommodates the sway of the body" 
[Newell et al. (1997)]. A protocol is divided into three phases: a first phase 
without perturbation, followed by a second phase with perturbation, fol- 
lowed by a last phase without perturbation. Different kinds of perturbations 
are considered. They can be characterized either as visual, or proprioceptive, 
or vestibular, depending on which sensorial system is perturbed. 

We specifically tackle the statistical problem of classifying subjects
sampled from a two-class population. The first class gathers subjects who do
not show any deficit in postural control. The second class gathers
hemiplegic subjects, who suffer from a proprioceptive deficit. Even though
differentiating two subjects from the two groups is relatively easy by visual
inspection, it is a much more delicate task when relying on some general 
baseline covariates and the trajectories provided by a force-platform. Fur- 
thermore, since undergoing several protocols can be unpleasant, and some- 
times painful (some sensitive subjects have to lie down for 15 minutes in 
order to recover from dizziness after a series of protocols), we also try to 
limit the number of protocols used for classifying. 

Our classification procedure relies on cutting-edge statistical methodolo- 
gies. In particular, we propose a nice preliminary ranking of the four pro- 
tocols (in view of how much we can learn from them on postural control) 
which involves the targeted maximum likelihood methodology [van der Laan 
and Rubin (2006), van der Laan and Rose (2011)], a statistical procedure for 
robust and efficient inference. The targeted maximum likelihood methodology
relies on the super-learning procedure, a machine learning methodology
for aggregating various estimation procedures (or simply estimators) into
a single better estimation procedure/estimator [van der Laan, Polley and 
Hubbard (2007), van der Laan and Rose (2011)]. In addition to being a key 
element of the targeted maximum likelihood ranking of the protocols, the 
super-learning procedure also plays a crucial role in the construction of our
classification procedure. 

We show that it is possible to achieve an 87% rate of correct classification 
(47 out of 54 subjects correctly classified; the performance is evaluated by 
the leave-one-out rule), using only the most informative protocol. Our
classification procedure is easy to generalize (we actually provide an example of
generalization), so we reasonably hope that even better results are within 
reach (especially considering that more data should soon augment our small 
data set). The interest of the article goes beyond the specific application. 
It nicely illustrates the versatility and power of the targeted maximum like- 
lihood and super-learning methodologies. It also shows that retrieving and 
comparing small-dimensional summary measures from complex trajectories 
may be convenient to classify them. 

The article is organized as follows. In Section 2 we describe the data set 
which is at the core of the study. The classification procedure is formally 
presented in Section 3, and its performances, evaluated by simulations, are 
discussed in Section 4. We report in Section 5 the results obtained by ap- 
plying the latter classification procedure to the real data set. We relegate to 
the supplementary file [Chambaz and Denis (2012)] a self-contained presen- 
tation of the super-learning procedure as it is used here, and the descrip- 
tion of an estimation procedure/estimator that will play a great role in the 
super-learning procedure applied to the construction of our classification 
procedure. 

2. Data description. The data set, collected at the Center for the study 
of sensorimotor functioning (CESEM, Université Paris Descartes), is
described in Section 2.1. We motivate the introduction of a summarized version
of each observed trajectory, and present its construction in Section 2.2. 

2.1. Original data set. Each subject undergoes four protocols that are 
designed to evaluate potential deficits in maintaining posture. The specifics 
of the latter protocols are presented in Table 1. Protocols 1 and 2, respec- 
tively, perturb the processing of visual data and proprioceptive information 
by the brain. Protocol 3 combines both perturbations. Protocol 4 relies on
perturbing the processing of vestibular information by the brain through 
a visual stimulation. 

A total of n = 54 subjects are enrolled. For each of them, the age, gender, 
laterality (the preference that most humans show for one side of their body 
over the other), height and weight are collected. Among the 54 subjects, 22
are hemiplegic (due to a cerebrovascular accident), and therefore suffer from
a proprioceptive deficit in postural control. Initial medical examinations
concluded that the 32 other subjects show no pronounced deficits in postural
control. We will refer to those subjects as normal subjects.

Table 1

Specifics of the four protocols designed to evaluate potential deficits in postural control.
A protocol is divided into three phases: a first phase without perturbation of the posture is
followed by a second phase with perturbations, which is followed by a last phase without
perturbation. Different kinds of perturbations are considered. They can be characterized
either as visual (closing the eyes), or proprioceptive (muscular stimulation), or vestibular
(optokinetic stimulation), depending on which sensorial information is perturbed

Protocol   1st phase (0 → 15 s)   2nd phase (15 → 50 s)                   3rd phase (50 → 70 s)
1          no perturbation        eyes closed                             no perturbation
2          no perturbation        muscular stimulation                    no perturbation
3          no perturbation        eyes closed and muscular stimulation    no perturbation
4          no perturbation        optokinetic stimulation                 no perturbation

For each protocol, the center of pressure of each foot is recorded over
time. Thus, each protocol results in a trajectory $(X_t)_{t \in T} = (L_t, R_t)_{t \in T}$, where
$L_t = (L_t^1, L_t^2) \in \mathbb{R}^2$ [resp., $R_t = (R_t^1, R_t^2)$] gives the position of the center of
pressure of the left (resp., right) foot on the force-platform at time $t$, for
each $t$ in $T = \{k\delta : 1 \le k \le 2800\}$ where the time-step $\delta = 0.025$ seconds (the
protocol lasts 70 seconds). We represent in Figure 1 two such trajectories
$(X_t)_{t \in T}$ associated with a normal subject and a hemiplegic subject, both
undergoing the third protocol (see Table 1). Note that we do not take into
account the first few seconds of the recording that a generic subject needs
to reach a stationary behavior.

Figure 1 confirms the intuition that the structure of a generic trajectory
$(X_t)_{t \in T}$ is complicated, and that a mere visual inspection is, at least on this
example, of little help for differentiating the normal and hemiplegic subjects.
Although several articles investigate how to model and use such trajectories
directly [Bertrand et al. (2001), Chambaz, Bonan and Vidal (2009)], we
choose instead to rely on a summary measure of $(X_t)_{t \in T}$ rather than on the
trajectory itself.

Fig. 1. Sequences $t \mapsto L_t$ (left) and $t \mapsto R_t$ (right) of positions of the center of pressure
over $T$ of both feet on the force-platform, associated with a normal subject (top) and
a hemiplegic subject (bottom), who undergo the third protocol (see Table 1).

2.2. Constructing a summary measure. The summary measure that we
construct is actually a summary measure of a one-dimensional trajectory
$(C_t)_{t \in T}$ that we initially derive from $(X_t)_{t \in T}$. First, we introduce the
trajectory of barycenters, $(B_t)_{t \in T} = (\frac{1}{2}(L_t + R_t))_{t \in T}$. Second, we evaluate a
reference position $b$ which is defined as the componentwise median value of
$(B_t)_{t \in T \cap [0,15]}$ (i.e., the median value over the first phase of the protocol).
Third, we set $C_t = \|B_t - b\|_2$ for all $t \in T$, the Euclidean distance between $B_t$
and the reference position $b$, which provides a relevant description of the
sway of the body during the course of the protocol. We plot in Figure 2 two
examples of $(C_t)_{t \in T}$ corresponding to two different protocols undergone by
a hemiplegic subject.

Fig. 2. Representing the trajectories $t \mapsto C_t$ over $T$ which correspond to two different
protocols undergone by a hemiplegic subject (protocol 1 on the left, protocol 3 on the right).

Because the most informative features can be found at the start and end of
the second phase, we use the following finite-dimensional summary measure
of $(X_t)_{t \in T}$ [through $(C_t)_{t \in T}$]:

(2.1)    $Y = (\bar{C}_1^+ - \bar{C}_1^-, \; \bar{C}_2^- - \bar{C}_1^+, \; \bar{C}_2^+ - \bar{C}_2^-)$,

where

$\bar{C}_1^- = \frac{\delta}{5} \sum_{t \in T \cap [10,15[} C_t$,    $\bar{C}_1^+ = \frac{\delta}{5} \sum_{t \in T \cap ]15,20]} C_t$,

$\bar{C}_2^- = \frac{\delta}{5} \sum_{t \in T \cap [45,50[} C_t$,    $\bar{C}_2^+ = \frac{\delta}{5} \sum_{t \in T \cap ]50,55]} C_t$

are the averages of $C_t$ computed over the intervals [10, 15[, ]15, 20], [45, 50[
and ]50, 55] (i.e., over the last/first 5 seconds before/after the beginning/
ending of the second phase of the protocol of interest). We arbitrarily choose
this 5-second threshold. Note that $\bar{C}_2^- - \bar{C}_1^- = Y_1 + Y_2$, $\bar{C}_2^+ - \bar{C}_1^+ = Y_2 + Y_3$,
$\bar{C}_2^+ - \bar{C}_1^- = Y_1 + Y_2 + Y_3$ are linear combinations of the components of $Y$. We
refer to Figure 3 for a visual representation of the definition of the summary
measure $Y$.

Fig. 3. Visual representation of the definition of the finite-dimensional summary
measure $Y$ of $(X_t)_{t \in T}$. The four horizontal segments (solid lines) represent, from left to right,
the averages $\bar{C}_1^-$, $\bar{C}_1^+$, $\bar{C}_2^-$, $\bar{C}_2^+$ of $(C_t)_{t \in T}$ over the intervals [10, 15[, ]15, 20], [45, 50[,
]50, 55]. The three vertical segments (solid lines ending with an arrow) represent, from top
to bottom, the components $Y_1$, $Y_2$ and $Y_3$ of $Y$. Two additional vertical lines indicate the
beginning and ending of the second phase of the considered protocol.
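The construction of Section 2.2 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function name `summary_measure`, the representation of the two trajectories as lists of (x, y) tuples whose entry k-1 is the position at time k times 0.025 seconds, and the use of integer sample indices to delimit the four windows are all assumptions. The removal of the first few seconds of the recording mentioned in Section 2.1 is not handled.

```python
import math
from statistics import mean, median

STEPS_PER_SECOND = 40  # 1 / delta with delta = 0.025 s (Section 2.1)


def summary_measure(left, right, steps_per_second=STEPS_PER_SECOND):
    """Return (Y_1, Y_2, Y_3) from the left/right center-of-pressure trajectories.

    left, right: lists of (x, y) tuples; entry k-1 is the position at time k * delta.
    """
    # Trajectory of barycenters B_t = (L_t + R_t) / 2.
    bary = [((lx + rx) / 2, (ly + ry) / 2)
            for (lx, ly), (rx, ry) in zip(left, right)]

    # Reference position b: componentwise median over the first phase, t <= 15 s.
    first_phase = bary[: 15 * steps_per_second]
    b = (median(p[0] for p in first_phase), median(p[1] for p in first_phase))

    # C_t: Euclidean distance between B_t and the reference position b.
    c = [math.dist(p, b) for p in bary]

    def window_mean(k_from, k_to):
        # Average of C_t over the sample indices k_from <= k <= k_to (k is 1-based).
        return mean(c[k_from - 1: k_to])

    s = steps_per_second
    c1_minus = window_mean(10 * s, 15 * s - 1)   # t in [10, 15[
    c1_plus = window_mean(15 * s + 1, 20 * s)    # t in ]15, 20]
    c2_minus = window_mean(45 * s, 50 * s - 1)   # t in [45, 50[
    c2_plus = window_mean(50 * s + 1, 55 * s)    # t in ]50, 55]

    return (c1_plus - c1_minus, c2_minus - c1_plus, c2_plus - c2_minus)
```

Working with integer indices rather than floating-point times avoids rounding issues at the window boundaries (15, 20, 50 and 55 seconds), where membership depends on whether the interval is open or closed.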



3. Classification procedure. We describe hereafter our two-step classifi- 
cation procedure. We formally introduce the statistical framework that we 
consider in Section 3.1. The first step of the classification procedure
consists in ranking the protocols from the most to the least informative with
respect to some criterion; see Section 3.2. The second step consists of the
classification; see Section 3.3. 

3.1. Statistical framework. The observed data structure $O$ writes as $O =
(W, A, Y^1, Y^2, Y^3, Y^4)$, where

• $W \in \mathbb{R} \times \{0,1\}^2 \times \mathbb{R}^2$ is the vector of baseline covariates (corresponding
to initial age, gender, laterality, height and weight, see Section 2.1);

• $A \in \{0,1\}$ indicates the subject's class (with convention $A = 1$ for
hemiplegic subjects and $A = 0$ for normal subjects);

• for each $j \in \{1,2,3,4\}$, $Y^j \in \mathbb{R}^3$ is the summary measure [as defined
in (2.1)] associated with the $j$th protocol.

We denote by $P_0$ the true distribution of $O$. Since we do not know much
about $P_0$, we simply see it as an element of the nonparametric set $\mathcal{M}$ of all
possible distributions of $O$.

We need a criterion to rank the four protocols from the most to the least
informative in view of the subject's class. To this end, we introduce the
functional $\Psi : \mathcal{M} \to \mathbb{R}^{12}$ such that, for any $P \in \mathcal{M}$, $\Psi(P) = (\Psi^j(P))_{1 \le j \le 4}$,
where

$\Psi^j(P) = (E_P\{E_P[Y_i^j | A = 1, W] - E_P[Y_i^j | A = 0, W]\})_{1 \le i \le 3}$.

The component $\Psi_i^j(P)$ is known in the literature as the variable importance
measure of $A$ on the summary measure $Y_i^j$ controlling for $W$ [van der Laan
and Rose (2011)]. Under causal assumptions, it can be interpreted as the
effect of $A$ on $Y_i^j$. More generally, we are interested in $\Psi_i^j(P_0)$ because the
further it is from zero, the more knowledge on $A$ we expect to gain from
the observation of $W$ and the summary measure $Y_i^j$ [i.e., by comparing
the averages of $(C_t)_{t \in T}$ computed over the time intervals corresponding to
index $i$; see Section 2.2]. For instance, say that $\Psi_1^2(P_0) > 0$: this means that
(in $P_0$-average) the variation in mean of the mean postures $\bar{C}_1^+$ and $\bar{C}_1^-$
of a hemiplegic subject computed before and after the beginning of the
muscular perturbation is larger than that of a normal subject. In words, the
postural control of a hemiplegic subject is more affected by the beginning
of the muscular perturbation than the postural control of a normal subject.
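To make the variable importance parameter concrete, here is a minimal sketch of a naive substitution estimate of the quantity obtained by averaging, over the distribution of W, the difference between the conditional means of one summary component given A = 1 and given A = 0. It assumes a single discrete covariate so that the inner conditional expectations reduce to within-stratum averages. It is emphatically not the targeted maximum likelihood estimator used in the article; the function name `plug_in_importance` and the (w, a, y) record layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def plug_in_importance(records):
    """Naive plug-in estimate of E{E[Y | A=1, W] - E[Y | A=0, W]}.

    records: iterable of (w, a, y) with w discrete (hashable), a in {0, 1}, y real.
    """
    # Group the outcomes by covariate stratum and by class.
    by_stratum = defaultdict(lambda: {0: [], 1: []})
    for w, a, y in records:
        by_stratum[w][a].append(y)

    n = sum(len(g[0]) + len(g[1]) for g in by_stratum.values())

    psi = 0.0
    for w, groups in by_stratum.items():
        if not groups[0] or not groups[1]:
            # The inner difference is not estimable without both classes.
            raise ValueError(f"stratum {w!r} lacks one of the two classes")
        # Empirical weight of the stratum times the within-stratum mean difference.
        weight = (len(groups[0]) + len(groups[1])) / n
        psi += weight * (mean(groups[1]) - mean(groups[0]))
    return psi
```

The error raised for a stratum with a single class reflects the fact that the parameter is only identifiable when both classes can be observed for every value of the covariates; with continuous covariates, the within-stratum averages would have to be replaced by regression estimates.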

3.2. Targeted maximum likelihood ranking of the protocols. Our ranking
of the four protocols relies on testing the null hypotheses

$H_{0i}^j$: "$\Psi_i^j(P_0) = 0$," $(i, j) \in \{1, 2, 3\} \times \{1, 2, 3, 4\}$,

against their two-sided alternatives. Heuristically, rejecting "$\Psi_i^j(P_0) = 0$"
tells us that the value of the $i$th coordinate of the summary measure $Y^j$
provides helpful information for the sake of determining whether $A = 0$ or
$A = 1$.

We consider tests based on the targeted maximum likelihood methodology 
[van der Laan and Rubin (2006), van der Laan and Rose (2011)]. Because 
presenting a self-contained introduction to the methodology would signifi- 
cantly lengthen the article, we provide below only a very succinct description 
of it. The targeted maximum likelihood methodology relies on the super- 
learning procedure, a machine learning methodology for aggregating various 
estimators into a single better estimator [van der Laan, Polley and
Hubbard (2007), van der Laan and Rose (2011)], based on the cross-validation
principle. Since super-learning also plays a crucial role in our classification 
procedure (see Section 3.3), and because it is possible to present a relatively 
short self-contained introduction to the construction of a super-learner, we 
propose such an introduction in the supplementary file [Chambaz and Denis 
(2012)]. 

Let $O_{(1)}, \ldots, O_{(n)}$ be $n$ independent copies of $O$. For each $(i, j) \in \{1,2,3\} \times
\{1,2,3,4\}$, we compute the targeted maximum likelihood estimator (TMLE)
$\Psi_{i,n}^j$ of $\Psi_i^j(P_0)$ based on $O_{(1)}, \ldots, O_{(n)}$ and an estimator $\sigma_{i,n}^j$ of its
asymptotic standard deviation $\sigma_i^j(P_0)$. The methodology applies because $\Psi_i^j$ is
a "smooth" parameter. It notably involves the super-learning of the
conditional means $Q_i^j(P_0)(A, W) = E_{P_0}(Y_i^j | A, W)$ and of the conditional
distribution $g(P_0)(A | W) = P_0(A | W)$ (the collection of estimators aggregated
by super-learning is given in the supplementary file [Chambaz and Denis
(2012)]). Under some regularity conditions, the estimator $\Psi_{i,n}^j$ of $\Psi_i^j(P_0)$
is consistent when either $Q_i^j(P_0)$ or $g(P_0)$ is consistently estimated, and it
satisfies a central limit theorem. In addition, if $g(P_0)$ is consistently
estimated by a maximum-likelihood based estimator, then $\sigma_{i,n}^j$ is a conservative
estimator of $\sigma_i^j(P_0)$. Thus, we can consider in the sequel the test statistics

$T_{i,n}^j = \frac{\sqrt{n}\, \Psi_{i,n}^j}{\sigma_{i,n}^j}$    (all $(i, j) \in \{1,2,3\} \times \{1,2,3,4\}$).

Now, we rank the four protocols by comparing the 3-dimensional vectors
of test statistics $(T_{1,n}^j, T_{2,n}^j, T_{3,n}^j)$ for $1 \le j \le 4$. Several criteria for comparing
the vectors were considered. They all relied on the fact that the larger $|T_{i,n}^j|$ is,
the less likely the null "$\Psi_i^j(P_0) = 0$" is true. Since the results were only
slightly affected by the criterion, we focus here on a single one. Thus, we
decide that protocol $j$ is more informative than protocol $j'$ if

$\sum_{i=1}^{3} (T_{i,n}^{j'})^2 < \sum_{i=1}^{3} (T_{i,n}^{j})^2$.

This rule is motivated by the fact that, if $\sigma_{1,n}^j$, $\sigma_{2,n}^j$, $\sigma_{3,n}^j$ are consistent
estimators of $\sigma_1^j(P_0)$, $\sigma_2^j(P_0)$, $\sigma_3^j(P_0)$, then $\sum_{i=1}^{3} (T_{i,n}^j)^2$ asymptotically follows
the $\chi^2(3)$ distribution under $H_0^j$: "$\Psi^j(P_0) = 0$."

By definition of $O$ and by construction of the TMLE procedure, this rule
yields almost surely a final ranking of the four protocols from the most to
the least informative for the sake of determining whether $A = 0$ or $A = 1$.
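The ranking rule itself is simple once the test statistics are available. The following sketch assumes that the twelve statistics have already been computed (by the TMLE procedure, which is not reproduced here) and are stored in a dictionary mapping each protocol index to its three statistics; the function name `rank_protocols` and this data layout are assumptions made for illustration.

```python
def rank_protocols(test_stats):
    """Rank protocols by decreasing sum of squared test statistics.

    test_stats: dict mapping a protocol index j to the tuple of its three statistics.
    Returns (ranking, criteria), where ranking lists the protocol indices from the
    most to the least informative and criteria maps j to its sum of squares.
    """
    # Criterion of Section 3.2: sum over the three coordinates of the squared statistic.
    criteria = {j: sum(t * t for t in stats) for j, stats in test_stats.items()}
    # A larger criterion means a more informative protocol.
    ranking = sorted(criteria, key=criteria.get, reverse=True)
    return ranking, criteria
```

For instance, with the made-up values `{1: (1, 1, 1), 2: (3, 0, 0), 3: (5, 1, 0), 4: (0, 0.5, 0)}`, the criteria are 3, 9, 26 and 0.25, so the protocols would be ranked 3, 2, 1, 4.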

3.3. Classifying a new subject. We now build a classifier $\phi$ for
determining whether $A = 0$ or $A = 1$ based on the baseline covariates $W$ and
summary measures $(Y^1, Y^2, Y^3, Y^4)$. To study the influence of the ranking
on the classification, we actually build four different classifiers $\phi^1$, $\phi^2$, $\phi^3$, $\phi^4$
which, respectively, use only the best (most informative) protocol, the two
best, the three best and all four protocols. So $\phi^J$ is a function of $W$ and of $J$
among the four vectors $Y^1, Y^2, Y^3, Y^4$.

Say that $\mathcal{J} \subset \{1,2,3,4\}$ has $J$ elements. First, we build an estimator
$h_n^J(W, Y^j, j \in \mathcal{J})$ of $P_0(A = 1 | W, Y^j, j \in \mathcal{J})$ based on $O_{(1)}, \ldots, O_{(n)}$, relying
again on the super-learning methodology (the collection of estimators
involved in the super-learning is given in the supplementary file [Chambaz
and Denis (2012)]). Second, we define

$\phi^J(W, Y^j, j \in \mathcal{J}) = 1\{h_n^J(W, Y^j, j \in \mathcal{J}) > \frac{1}{2}\}$

and decide to classify a new subject with information $(W, Y^j, j \in \mathcal{J})$ into
the group of hemiplegic subjects if $\phi^J(W, Y^j, j \in \mathcal{J}) = 1$ or into the group
of normal subjects otherwise.

Thus, the classifier $\phi^J$ relies on a plug-in rule, in the sense that the Bayes
decision rule $1\{P_0(A = 1 | W, Y^j, j \in \mathcal{J}) > \frac{1}{2}\}$ is mimicked by the empirical
version where one substitutes an estimator of $P_0(A = 1 | W, Y^j, j \in \mathcal{J})$ for
the latter regression function. Such classifiers can converge with fast rates
under a complexity assumption on the regression function and the so-called
margin condition [Audibert and Tsybakov (2007)].
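The plug-in rule can be illustrated as follows. The sketch wraps any estimate of the conditional probability of being hemiplegic into a classifier by thresholding at one half. The names `make_plug_in_classifier` and `toy_h` are hypothetical, and `toy_h` is a placeholder logistic function standing in for the super-learner of the article, which is not reproduced here.

```python
import math


def make_plug_in_classifier(h_n, threshold=0.5):
    """Turn an estimate h_n of P(A = 1 | features) into a plug-in classifier."""
    def phi(features):
        # Classify as hemiplegic (1) if the estimated probability exceeds the
        # threshold, as normal (0) otherwise.
        return 1 if h_n(features) > threshold else 0
    return phi


def toy_h(features):
    # Hypothetical stand-in for the estimated regression function:
    # a logistic function of the first feature only.
    return 1.0 / (1.0 + math.exp(-features[0]))
```

With `phi = make_plug_in_classifier(toy_h)`, a subject whose first feature is positive is classified as hemiplegic; a subject whose estimated probability equals exactly one half is classified as normal, because the inequality in the rule is strict.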

4. Simulation study. In this section we carry out and report the results 
of a simulation study of the performances of the classification procedure 
described in Section 3. The details of the simulation scheme are presented 
in Section 4.1, and the results are reported and evaluated in Section 4.2. 

4.1. Simulation scheme. Instead of simulating $(W, A)$ and the four
complex trajectories $(X_t^1)_{t \in T}$, $(X_t^2)_{t \in T}$, $(X_t^3)_{t \in T}$, $(X_t^4)_{t \in T}$ associated with four
fictitious protocols, we generate directly $(W, A)$ and the summary
measures $Y^1, Y^2, Y^3, Y^4$ that one would have derived from the trajectories
$(X_t^1)_{t \in T}$, $(X_t^2)_{t \in T}$, $(X_t^3)_{t \in T}$, $(X_t^4)_{t \in T}$. Three different scenarios/probability
distributions $P_0^1, P_0^2, P_0^3$ are considered. They only differ from each other
with respect to the conditional distributions $g(P_0^1)$, $g(P_0^2)$, $g(P_0^3)$ (see
Table 2 for their characterization).



Table 2

Characterization of the three conditional distributions $g(P_0^k)$,
$k = 1, 2, 3$, as considered in the simulation scheme

Wi W-2 Wi Wa 
Scenario Llogit^Po 1 )^ = l\W) = + - ^1 - ^ + W B 

Scenario 2: logit $g(P_0^2)(A = 1 | W) = \cos(W_1 + W_5) + \sin(W_1 + W_5)$

Scenario 3: logit $g(P_0^3)(A = 1 | W) = \lfloor 10\cos(W_1 + W_3) \rfloor + \sqrt{5\cos(W_1 + W_3) - \lfloor 5\cos(W_1 + W_3) \rfloor}\, \sin(10\cos(W_1 + W_3))$

oU 
Table 3

Conditional means $Q_i^j(A, W)$ of $Y_i^j$ given $(A, W)$, $(i, j) \in \{1,2,3\} \times \{1,2,3,4\}$, as used in
the three different scenarios of the simulation scheme



Fictitious protocol 



Conditional means 



i = i 



J =4 



}\ {A, W) = 2[Asm{W 1 + W 4 ) + (1 - A) cos(Wi + W 5 )] 
)l(AW) 



= 3 



Qi{A,W) 

qUaw) 

Ql(A,W) 



Q\{A,W) 

Qi(A,W) 
Qi(A,W) 



(1 - 6A)X 5 - AX 4 + X 3 - ( 1 - ^ ) X 2 + AX 



where X ■■ 



(1 - 2A)W 5 



Ql{A,W) = Atan(W 4 ) + (1 - A)tan(W s + W1W2) 
Ql{A,W) 



160 



+ 



: — [A+W1+W2 + W3 + W5 + WlW 2 

+ {1-A)W 5 + W 2 W 3 W 4 ] 
: 5[ J 4sin(Wi + Wa) + (1 - A) cos(Wi + W 4 )} 
1_ " 
20 

:Al0g + ^W 3 



A[2W 1 + -W 3 )+(l-A)W 5 



Ql{A, W) = —(X + 7)(X + 2)(X -7)(X- 3) 
45 



where X — 



\(A,W) =Tv[Asin(X)(l2X\ + -J2X - \ 2X\) 



145 + AWi 



+ (l-A)cos(X)(\2X\+^/2X- \2X\ )] 

where X = cos(W 3 + W 4 + W 5 



100 
1 



(2X 3 +X 2 -X-l) 



60 



where X = 

(A + Wi + + W3 + W5) 
W1W3W4. 



AW 2 + W 4 + W 5 
30 



1000 



(1 - A)(Wi + W 3 W 4 ) + AW2W5 



For each $k = 1, 2, 3$, an observation $O = (W, A, Y^1, Y^2, Y^3, Y^4)$ drawn
from $P_0^k$ meets the following constraints:

1. $W$ is drawn from a slightly perturbed version of the empirical
distribution of $W$ as obtained from the original data set (the same for all $k = 1, 2, 3$);

2. conditionally on $W$, $A$ is drawn from $g(P_0^k)(\cdot | W)$;

3. conditionally on $(A, W)$ and for each $(i, j) \in \{1, 2, 3\} \times \{1, 2, 3, 4\}$, $Y_i^j$ is
drawn from the Gaussian distribution with mean $Q_i^j(A, W)$ (the same for
all $k = 1, 2, 3$; see Table 3 for the definition of the conditional means) and
common standard deviation $\sigma \in \{0.5, 1\}$.
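The three steps above can be sketched as follows for a single summary component. The sketch uses the logit of Scenario 2 from Table 2 and one conditional mean of Table 3, namely 2[A sin(W1 + W4) + (1 - A) cos(W1 + W5)]. Several elements are assumptions made for illustration: the covariate vector is a 5-tuple ordered as (age, gender, laterality, height, weight), the "slight perturbation" of the empirical distribution is taken to be a small Gaussian jitter on the three continuous components (the article does not specify it in this section), and the function names are hypothetical. A full draw would repeat step 3 for all twelve components.

```python
import math
import random


def expit(x):
    # Inverse of the logit function.
    return 1.0 / (1.0 + math.exp(-x))


def logit_g_scenario2(w):
    # Scenario 2 of Table 2: cos(W1 + W5) + sin(W1 + W5); w is 0-indexed.
    s = w[0] + w[4]
    return math.cos(s) + math.sin(s)


def q_example(a, w):
    # One conditional mean from Table 3: 2[A sin(W1 + W4) + (1 - A) cos(W1 + W5)].
    return 2.0 * (a * math.sin(w[0] + w[3]) + (1 - a) * math.cos(w[0] + w[4]))


def draw_observation(w_pool, sigma, rng, jitter=0.01):
    """Draw (W, A, Y) for one summary component under the assumptions stated above."""
    # Step 1: resample W from the observed covariates, then jitter the continuous
    # components (indices 0, 3, 4); the jitter is a hypothetical perturbation.
    base = rng.choice(w_pool)
    w = tuple(x + rng.gauss(0.0, jitter) if i in (0, 3, 4) else x
              for i, x in enumerate(base))
    # Step 2: draw A from the Bernoulli distribution with parameter g(A = 1 | W).
    a = 1 if rng.random() < expit(logit_g_scenario2(w)) else 0
    # Step 3: draw Y from the Gaussian distribution with mean Q(A, W) and s.d. sigma.
    y = rng.gauss(q_example(a, w), sigma)
    return w, a, y
```

Passing an explicit `random.Random` instance as `rng` keeps the simulation reproducible, which matters when the whole procedure is repeated B = 100 times.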

Although that may not be clear when looking at Table 2, the difficulty of
the classification problem should vary from one scenario to the other. When
using the first conditional distribution $g(P_0^1)$, the conditional probability of
$A = 1$ given $W$ is concentrated around $\frac{1}{2}$, as seen in Figure 4 (solid line),
with $P_0^1(g(P_0^1)(1 | W) \in [0.48, 0.54]) \simeq 1$. In words, the covariates provide
little information for predicting the class $A$. On the other hand, estimating $g(P_0^1)$
from the data is easy since logit $g(P_0^1)(A = 1 | W)$ is a simple linear function
of $W$. The conditional probabilities of $A = 1$ given $W$ under $g(P_0^2)$ and $g(P_0^3)$
are less concentrated around $\frac{1}{2}$, as seen in Figure 4 (dashed and dotted lines,
resp.). Thus, the covariates may provide valuable information for predicting
the class. But this time, logit $g(P_0^2)$ and logit $g(P_0^3)$ are tricky functions
of $W$.

Fig. 4. Visual representation of the three conditional distributions considered in
the simulation scheme. We plot the empirical cumulative distribution functions of
$\{g(P_0^k)(A = 1 | W_{(\ell)}) : \ell = 1, \ldots, L\}$ for $k = 1$ (solid line), $k = 2$ (dashed line) and $k = 3$ (dotted
line), where $W_{(1)}, \ldots, W_{(L)}$ are independent copies of $W$ drawn from the marginal
distribution of $W$ under $P_0^k$ (which does not depend on $k$), and $L = 10^5$.

Likewise, the family of conditional means $Q_i^j(A, W)$ of $Y_i^j$ given $(A, W)$
that we use in the simulation scheme is meant to cover a variety of
situations with regard to how difficult it is to estimate each of them and how
much they tell about the class prediction. Instead of representing the latter
conditional means, we find it more relevant to provide the reader with the
values (computed by Monte-Carlo simulations) of

$S^j(P_0^k) = \sum_{i=1}^{3} \left( \frac{\Psi_i^j(P_0^k)}{\sigma_i^j(P_0^k)} \right)^2$

for $(j, k) \in \{1, 2, 3, 4\} \times \{1, 2, 3\}$ and $\sigma \in \{0.5, 1\}$; see Table 4. Indeed, $n S^j(P_0^k)$
should be interpreted as a theoretical counterpart to the criterion $\sum_{i=1}^{3} (T_{i,n}^j)^2$.
In particular, we derive from Table 4 the theoretical ranking of the protocols:
for every scenario $P_0^k$ and $\sigma \in \{0.5, 1\}$, the protocols ranked by decreasing
order of informativeness are protocols 3, 2, 1, 4.
Table 4

Values of $S^j(P_0^k)$ for $(j, k) \in \{1,2,3,4\} \times \{1,2,3\}$ and $\sigma \in \{0.5, 1\}$. They notably teach us
that, for every scenario $P_0^k$ and $\sigma \in \{0.5, 1\}$, the protocols ranked by decreasing order of
informativeness are protocols 3, 2, 1, 4

                      Scenario 1           Scenario 2           Scenario 3
Fictitious protocol   σ = 0.5   σ = 1      σ = 0.5   σ = 1      σ = 0.5   σ = 1
j = 1                 0.14      0.04       0.11      0.03       0.14      0.04
j = 2                 0.86      0.37       0.74      0.31       0.85      0.37
j = 3                 2.94      1.12       2.49      0.93       2.90      1.11
j = 4                 0.06      0.01       0.04      0.01       0.06      0.01

4.2. Leave-one-out evaluation of the performances of the classification
procedure. We rely on the leave-one-out rule to evaluate the performances
of the classification procedure. We acknowledge that it usually results in
overly optimistic error rates. Specifically, we repeat independently $B = 100$
times the following steps for $k = 1, 2, 3$:

1. Draw independently $O_{(1,b)}, \ldots, O_{(n,b)}$ from $P_0^k$, with $n = 54$; we denote
by $A_{(\ell,b)}$ the group membership indicator associated with $O_{(\ell,b)}$, and by $O'_{(\ell,b)}$
the observed data structure $O_{(\ell,b)}$ deprived of $A_{(\ell,b)}$.

2. For each $\ell \in \{1, \ldots, n\}$,

(a) set $S_{(\ell,b)} = \{O_{(\ell',b)} : \ell' \ne \ell, \ell' \le n\}$;

(b) based on $S_{(\ell,b)}$, rank the protocols (see Section 3.2), then build
four classifiers $\phi^1_{(\ell,b)}$, $\phi^2_{(\ell,b)}$, $\phi^3_{(\ell,b)}$ and $\phi^4_{(\ell,b)}$ (see Section 3.3),
which, respectively, use only the best (most informative), the two
best, the three best and all four protocols (thus, $\phi^J_{(\ell,b)}$ is a function
of the covariate $W$ and of $J$ among the four vectors $Y^1$,
$Y^2$, $Y^3$, $Y^4$);

(c) classify $O_{(\ell,b)}$ according to the four classifications $\phi^1_{(\ell,b)}(O'_{(\ell,b)})$,
$\phi^2_{(\ell,b)}(O'_{(\ell,b)})$, $\phi^3_{(\ell,b)}(O'_{(\ell,b)})$, $\phi^4_{(\ell,b)}(O'_{(\ell,b)})$.

3. Compute $\mathrm{Perf}_b^J = \frac{1}{n} \sum_{\ell=1}^{n} 1\{A_{(\ell,b)} = \phi^J_{(\ell,b)}(O'_{(\ell,b)})\}$ for $J = 1, 2, 3, 4$.
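Steps 2 and 3 amount to a standard leave-one-out loop, which can be sketched as follows. The sketch is generic: `fit` stands for the whole learning step (ranking of the protocols followed by the construction of a classifier), which is not reproduced here. The function names and the one-dimensional nearest-class-mean rule `fit_nearest_mean` are hypothetical stand-ins used only to make the example runnable.

```python
from statistics import mean


def leave_one_out_accuracy(data, fit):
    """Proportion of observations correctly classified under the leave-one-out rule.

    data: list of (features, label) pairs.
    fit: function mapping a training list to a classifier (features -> label).
    """
    hits = 0
    for i, (features, label) in enumerate(data):
        # Step 2(a): training set deprived of the i-th observation.
        training = data[:i] + data[i + 1:]
        # Step 2(b): learn the classifier on the reduced sample only.
        classifier = fit(training)
        # Step 2(c): classify the left-out observation from its features alone.
        hits += int(classifier(features) == label)
    # Step 3: average of the indicators of correct classification.
    return hits / len(data)


def fit_nearest_mean(training):
    # Toy learner (an assumption, not the article's procedure): assign a scalar
    # feature to the class whose training mean is closest.
    m0 = mean(x for x, a in training if a == 0)
    m1 = mean(x for x, a in training if a == 1)
    return lambda x: 1 if abs(x - m1) < abs(x - m0) else 0
```

The key point, mirrored in step 2(b), is that everything learned from the data (including the ranking of the protocols) must be recomputed on each reduced sample; ranking once on the full sample and only refitting the classifier would leak information from the left-out subject.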

From these results, we compute for each $J \in \{1, 2, 3, 4\}$ the mean and
standard deviation of the sample $(\mathrm{Perf}_1^J, \ldots, \mathrm{Perf}_B^J)$. First, all the standard deviations
are approximately equal to 5%. Second, for every value of $\sigma \in \{0.5, 1\}$,
performance $\mathrm{Perf}^J$ actually depends only slightly on $J$ (i.e., on the number
of protocols taken into account in the classification procedure), without any
significant difference for $J = 1, 2, 3, 4$. Third, the latter performances all equal
approximately 80% when $\sigma = 1$, and increase to approximately 90% when
$\sigma = 0.5$. This increase is the expected illustration of the fact that the larger
the variability of the summary measures, the more difficult the
classification procedure. On the other hand, it is a little bit surprising that the
conditional distributions $g(P_0^1)$, $g(P_0^2)$, $g(P_0^3)$ do not significantly affect the
performances. Anecdotally, the estimated ranking of the protocols always
coincides with the ranking that we derived from Table 4.

Table 5

Ranking the four protocols using the entire real data set. We report the realizations of the
criteria $\sum_{i=1}^{3} (T_{i,n}^j)^2$ obtained for protocols $j = 1, 2, 3, 4$. These values teach us that the
most informative protocol is protocol 3, and that the three next protocols ranked by
decreasing order of informativeness are protocols 2, 1 and 4

Protocol                                    j = 3    j = 2    j = 1    j = 4
Criterion $\sum_{i=1}^{3} (T_{i,n}^j)^2$    75.51    33.13    6.80     5.53

5. Application to the real data set. We present here the results of the 
classification procedure of Section 3 applied to the real data set. Thus, we 
first rank the protocols from the most to the least informative regarding
postural control (see Section 5.1); then we construct the four classifiers and rely
on the leave-one-out rule to evaluate their performances (see Section 5.2). 
A natural extension of the classification procedure is considered and applied 
in Section 5.3, and yields significantly better results. We conclude the article 
with a discussion; see Section 5.4. 

5.1. Targeted maximum likelihood ranking of the protocols over the real 
data set. Hemiplegic subjects are known to be sensitive to muscular stim- 
ulations, and also to tend to compensate for their proprioceptive deficit by 
developing a preference for visual information in order to maintain posture 
[Bonan et al. (1996)]. This suggests that protocols involving muscular and/or 
visual stimulations should rank high. What do the data tell us? 

We derive and report in Table 5 the results of the ranking of the proto- 
cols using the entire data set. Table 5 teaches us that the most informative 
protocol is protocol 3 (visual and muscular stimulations), and that the three 
next protocols ranked by decreasing order of informativeness are protocols 2 
(muscular stimulation), 1 (visual stimulation) and 4 (optokinetic stimula- 
tion). Apparently, protocols 3 and 2 (which have in common that muscular 
stimulations are involved) are highly relevant for differentiating normal and 
hemiplegic subjects based on postural control data. On the contrary (and 
perhaps surprisingly, given the introductory remark), protocols 1 and 4 seem 
to provide significantly less information for the same purpose. 

5.2. Classification procedures applied to the real data set. To evaluate
the performances of the classification procedure applied to the real data
set, we carry out steps 2a, 2b, 2c from the leave-one-out rule described
in Section 4.2, where we substitute the real data set $O_{(1)}, \ldots, O_{(n)}$ for the
simulated one. We actually do it twice. The first time, the super-learning
methodology involves a large collection of estimators; the second time, we
justify resorting to a smaller collection (see the supplementary file [Chambaz
and Denis (2012)]). We report the results in Table 6, where the second and
third rows, respectively, correspond to the first (larger collection) and second
(smaller collection) rounds of performance evaluation.

Table 6

Leave-one-out performances $\mathrm{Perf}^J$ of the classification procedure using the real data set.
Performance $\mathrm{Perf}^J$ corresponds to the classifier based on $J$ among the four vectors $Y^1$,
$Y^2$, $Y^3$, $Y^4$ (those associated with the $J$ most informative protocols) and either using all
estimators (second row) or only two of them (third row) in the super-learner (see
Appendix A in the supplementary file [Chambaz and Denis (2012)])

                                 J = 1          J = 2          J = 3          J = 4
$\mathrm{Perf}^J$ (all est.)     0.70 (38/54)   0.80 (43/54)   0.74 (40/54)   0.78 (42/54)
$\mathrm{Perf}^J$ (two est.)     0.74 (40/54)   0.81 (44/54)   0.78 (42/54)   0.85 (46/54)

Consider first the performances of the classification procedure relying on 
the larger collection. The proportion of subjects correctly classified (evalu- 
ated by the leave-one-out rule) equals only 70% (38 out of the 54 subjects 
are correctly classified) when the sole most informative protocol (i.e., pro- 
tocol 3) is exploited. This rate jumps to 80% (43 out of 54 subjects are 
correctly classified) when the two most informative protocols (i.e., proto- 
cols 3 and 2) are exploited. Including one or two of the remaining protocols 
decreases the performances. 
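
The leave-one-out rate of correct classification can be sketched as follows. This is only an illustration under stated assumptions: a nearest-centroid rule stands in for the super-learner-based plug-in classifier, and the data (including the class sizes) are synthetic. The names `nearest_centroid_predict` and `leave_one_out_rate` are hypothetical.

```python
import numpy as np

def nearest_centroid_predict(X_train, a_train, x_new):
    """Stand-in classifier: predict the class whose centroid is closest."""
    centroids = {label: X_train[a_train == label].mean(axis=0) for label in (0, 1)}
    return min(centroids, key=lambda label: np.linalg.norm(x_new - centroids[label]))

def leave_one_out_rate(X, a, predict=nearest_centroid_predict):
    """Hold out each subject in turn, train on the others, count correct calls."""
    n = len(a)
    correct = 0
    for i in range(n):
        keep = np.arange(n) != i          # all subjects except the i-th
        correct += int(predict(X[keep], a[keep], X[i]) == a[i])
    return correct, correct / n

# Synthetic cohort of 54 subjects with hypothetical class sizes and features.
rng = np.random.default_rng(1)
n = 54
a = np.repeat([0, 1], n // 2)                     # class labels
X = rng.normal(size=(n, 3)) + 1.5 * a[:, None]    # 3-dimensional summary measure
correct, rate = leave_one_out_rate(X, a)
print(f"{rate:.2f} ({correct}/{n})")
```

The printed format mirrors the entries of Table 6, a rate followed by the count of correctly classified subjects out of 54.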

The theoretical properties of the super-learning procedure are asymptotic, 
that is, valid when the sample size n is large, which is not the case in this 
study. Even though this is contradictory to the philosophy of the super- 
learning methodology, it is tempting to reduce the number of estimators 
involved in the super-learning. We therefore keep only two of them, and run 
again steps 2a, 2b, 2c from the leave-one-out rule described in Section 4.2, 
where we substitute the real data set $(O_{(1)}, \ldots, O_{(n)})$ for the simulated one.
Results are reported in Table 6 (third row). We obtain better performances: 
for each value of J (i.e., each number of protocols taken into account in the 
classification procedure), the second classifier outperforms the first one. The 
best performance is achieved when all four protocols are used, yielding a rate 
of correct classification equal to 85% (46 out of the 54 subjects are correctly 
classified). This is encouraging, notably because one can reasonably expect
performances to improve when a larger cohort is available.

Yet, this is not the end of the story. We have built a general methodology
that can be easily extended, for instance, by enriching the small-dimensional
summary measure derived from each complex trajectory. We explore the
effects of such an extension in the next section.

Table 7

Ranking the four protocols using the entire real data set and the extended
small-dimensional summary measure of the complex trajectories. We report the
realizations of the criteria $\sum_{i=1}^{8} (T_{i,n}^{j})^2$ obtained for
protocols $j = 1, 2, 3, 4$. The ranking is the same as that derived from
Table 5

Protocol                                      j = 3    j = 2    j = 1    j = 4
Criterion $\sum_{i=1}^{8} (T_{i,n}^{j})^2$    83.64    43.61    14.92    12.60

5.3. Extension. Thus, we enrich the small-dimensional summary measure
initially defined in Section 2.2. Since it mainly involves distances from
a reference point, the most natural extension is to add information
pertaining to orientation. Relying on polar coordinates of the trajectory
$(B_t)_{t \in T}$ poses some technical issues. Instead, we propose to fit
simple linear models $y(B_t) = v x(B_t) + u$ [where $x(B_t)$ and $y(B_t)$
are the abscissa and ordinate of $B_t$] based on the data sets
$\{B_t : t \in T \cap [10, 15[\}$, $\{B_t : t \in T \cap [15, 20[\}$,
$\{B_t : t \in T \cap [20, 45[\}$, $\{B_t : t \in T \cap [45, 50[\}$ and
$\{B_t : t \in T \cap [50, 55[\}$, and to use the slope estimates as summary
measures of an average orientation over each time interval. The observed
data structure and parameter of interest can still be written as
$O = (W, A, Y^1, Y^2, Y^3, Y^4)$ and $\Psi(P) = (\Psi^j(P))_{1 \le j \le 4}$,
but $Y^j$ and $\Psi^j(P)$ now belong to $\mathbb{R}^8$ (and not
$\mathbb{R}^3$ anymore). The ranking of the protocols now relies on the
criterion $\sum_{i=1}^{8} (T_{i,n}^{j})^2$, whose definition straightforwardly
extends that of the criterion introduced in Section 3.2. The values of the
criteria are reported in Table 7. The ranking of protocols remains
unchanged, but the discrepancies between the values for protocol 2, on one
hand, and for protocols 1 and 4, on the other hand, are smaller.
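
The slope-based part of the extended summary measure can be sketched as below, assuming the trajectory is available as arrays of time stamps and planar coordinates. The function name `window_slopes`, the 40 Hz sampling rate and the synthetic trajectory are assumptions made for illustration; only the five time windows and the linear model $y(B_t) = v x(B_t) + u$ come from the text.

```python
import numpy as np

# Half-open time windows [a, b[ in seconds, as listed in the text.
WINDOWS = [(10, 15), (15, 20), (20, 45), (45, 50), (50, 55)]

def window_slopes(t, x, y, windows=WINDOWS):
    """Least-squares slope v of y = v * x + u on each window [a, b[."""
    t, x, y = (np.asarray(arr, dtype=float) for arr in (t, x, y))
    slopes = []
    for start, stop in windows:
        mask = (t >= start) & (t < stop)
        # np.polyfit returns coefficients from highest degree down: (v, u).
        v, u = np.polyfit(x[mask], y[mask], deg=1)
        slopes.append(v)
    return np.array(slopes)

# Synthetic 70 s trajectory sampled at a hypothetical 40 Hz.
rng = np.random.default_rng(0)
t = np.arange(2800) / 40.0
x = 0.05 * np.cumsum(rng.normal(size=t.size))        # wandering abscissa
y = 0.5 * x + rng.normal(scale=0.01, size=t.size)    # ordinate with true slope 0.5
slopes = window_slopes(t, x, y)
print(slopes)
```

Appending these five slopes to the three initial components gives a vector in $\mathbb{R}^8$ for each protocol.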

We finally apply once again steps 2a, 2b, 2c from the leave-one-out rule
described in Section 4.2, where we substitute the real data set
$(O_{(1)}, \ldots, O_{(n)})$ for the simulated one, and use either all
estimators or only two of them in the super-learner. The results are
reported in Table 8.

Table 8

Leave-one-out performances $\mathrm{Perf}_J$ of the classification procedure
using the real data set and the extended small-dimensional summary measure
of the complex trajectories. Performance $\mathrm{Perf}_J$ corresponds to the
classifier based on $J$ among the four vectors $Y^1, Y^2, Y^3, Y^4$ (those
associated with the $J$ most informative protocols) and either using all
estimators (second row) or only two of them (third row) in the super-learner
(see Appendix A in the supplementary file [Chambaz and Denis (2012)])

                                 J = 1         J = 2         J = 3         J = 4
$\mathrm{Perf}_J$ (all est.)     0.82 (44/54)  0.80 (43/54)  0.80 (43/54)  0.78 (42/54)
$\mathrm{Perf}_J$ (two est.)     0.87 (47/54)  0.85 (46/54)  0.80 (43/54)  0.82 (44/54)

When we include all estimators in the super-learner, the classification
procedure that relies on the extended small-dimensional summary measure of
the complex trajectories performs at least as well as the classification
procedure that relies on the initial summary measure, for every value of J
(i.e., each number of protocols taken into account in the classification
procedure). The performances are better still when we only include two
estimators, except for J = 3, where they are unchanged. Remarkably, the best
performance is achieved using only the most informative protocol, with a
proportion of subjects correctly classified (evaluated by the leave-one-out
rule) equal to 87% (47 out of the 54 subjects are correctly classified).

5.4. Discussion. We conducted a brief simulation study to evaluate the
performances of the classification procedure. With its three different
scenarios [i.e., three conditional distributions $g(P_0)$] and four
trajectories (i.e., twelve conditional means $Q_i^j$), the simulation scheme
is far from comprehensive. Rather than extending the simulation study, we
discuss here what additional scenarios would need to be considered before
applying the procedure more generally. In the same spirit as in Section 4,
one should consider the following:

• other conditional distributions $g(P_0)$, with $|g(P_0)(A = 1|W) - 1/2|$
being close to $1/2$ with high probability ($W$ strong predictor of $A$) or
low probability ($W$ weak predictor of $A$);

• other conditional means $Q_i^j$, $(i,j) \in \{1,2,3\} \times \{1,2,3,4\}$,
and standard deviation $\sigma$, with $\{S^j(P_0) : j = 1, 2, 3, 4\}$ having
one, two, three or four well-separated values.

A straightforward generalization would consist in allowing the standard
deviation of $Y_i^j$ to vary rather than being a single constant.
Furthermore, another approach to simulating could be considered, where the
trajectories $(X_t^1)_{t \in T}$, $(X_t^2)_{t \in T}$, $(X_t^3)_{t \in T}$,
$(X_t^4)_{t \in T}$ would be obtained as realizations of stochastic processes
satisfying a variety of piecewise stochastic differential equations (SDEs).
For instance, the same SDE could be used to simulate the trajectory during
the first and third phases ($0 \to 15$ s and $50 \to 70$ s, without
perturbations), and another SDE could be used to simulate during the second
phase ($15 \to 50$ s, with perturbations). On top of that, the breaking
points could be drawn randomly from two symmetric distributions centered at
15 s and 50 s.
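
A minimal sketch of this alternative simulation scheme is given below. Every modeling choice in it is an assumption made for illustration: a two-dimensional Ornstein-Uhlenbeck equation is used in each phase, the parameter values are arbitrary, the breaking points are Gaussian, and the function name `simulate_trajectory` is hypothetical. The discretization is the standard Euler-Maruyama scheme.

```python
import numpy as np

def simulate_trajectory(rng, dt=0.025, t_end=70.0, break_sd=0.5,
                        rest=(1.0, 0.2), perturbed=(0.3, 0.6)):
    """Euler-Maruyama simulation of a 2D piecewise Ornstein-Uhlenbeck process.

    Each phase follows dB_t = -theta * B_t dt + sigma dW_t, with (theta, sigma)
    equal to `rest` outside the perturbation phase and `perturbed` inside it.
    """
    # Breaking points drawn from symmetric (here Gaussian) laws centered at
    # 15 s and 50 s.
    b1 = rng.normal(15.0, break_sd)
    b2 = rng.normal(50.0, break_sd)
    n_steps = int(round(t_end / dt))
    t = np.arange(n_steps) * dt
    B = np.zeros((n_steps, 2))
    for k in range(1, n_steps):
        # Same SDE in the first and third phases, another one in between.
        theta, sigma = perturbed if b1 <= t[k - 1] < b2 else rest
        noise = rng.normal(size=2) * np.sqrt(dt)
        B[k] = B[k - 1] - theta * B[k - 1] * dt + sigma * noise
    return t, B, (b1, b2)

t, B, (b1, b2) = simulate_trajectory(np.random.default_rng(0))
print(B.shape, round(b1, 2), round(b2, 2))
```

Simulating four such trajectories per subject, with class-dependent parameters, would yield complex trajectories from which summary measures could then be extracted.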

This alternative approach to simulating arose while we were trying to
quantify in some way how much information is lost when one substitutes
a summary measure for the original trajectory for the purpose of
classifying. Ultimately, such a quantification could make it possible to
devise new summary measures with minimal information loss. We did not obtain
a satisfactory answer to this very difficult question. However, we
identified important information that can be derived from the original
trajectory, such as mean orientation, as used in Section 5.3, and empirical
breaking points, as mentioned for the sake of simulating in the previous
paragraph, and used for the sake of classifying by Denis (2011).

SUPPLEMENTARY MATERIAL

Supplementary file: Supplement to "Classification in postural style" (DOI:
10.1214/12-AOAS542SUPP; .pdf). We gather in this Supplementary file
a short and self-contained description of the construction of a super-learner,
as well as the estimation procedures that we choose to involve for the sake
of classifying subjects in postural style. One of those estimation procedures,
a variant of the top-scoring pairs classification procedure, is specifically
presented.